Dynamic clustering threshold reduces conformer ensemble size while maintaining a biologically relevant ensemble

نویسندگان

  • Austin B. Yongye
  • Andreas Bender
  • Karina Martínez-Mayorga
چکیده

Representing the 3D structures of ligands in virtual screenings via multi-conformer ensembles can be computationally intensive, especially for compounds with a large number of rotatable bonds. Thus, reducing the size of multi-conformer databases and the number of query conformers, while simultaneously reproducing the bioactive conformer with good accuracy, is of crucial interest. While clustering and RMSD filtering methods are employed in existing conformer generators, the novelty of this work is the inclusion of a clustering scheme (NMRCLUST) that does not require a user-defined cut-off value. This algorithm simultaneously optimizes the number and the average spread of the clusters. Here we describe and test four inter-dependent approaches for selecting computer-generated conformers, namely: OMEGA, NMRCLUST, RMS filtering and averaged-RMS filtering. The bioactive conformations of 65 selected ligands were extracted from the corresponding protein:ligand complexes from the Protein Data Bank, including eight ligands that adopted dissimilar bound conformations within different receptors. We show that NMRCLUST can be employed to further filter OMEGA-generated conformers while maintaining biological relevance of the ensemble. It was observed that NMRCLUST (containing on average 10 times fewer conformers per compound) performed nearly as well as OMEGA, and both outperformed RMS filtering and averaged-RMS filtering in terms of identifying the bioactive conformations with excellent and good matches (0.5 < RMSD < 1.0 A). Furthermore, we propose thresholds for OMEGA root-mean square filtering depending on the number of rotors in a compound: 0.8, 1.0 and 1.4 for structures with low (1-4), medium (5-9) and high (10-15) numbers of rotatable bonds, respectively. The protocol employed is general and can be applied to reduce the number of conformers in multi-conformer compound collections and alleviate the complexity of downstream data processing in virtual screening experiments.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble

An ensemble clustering has been considered as one of the research approaches in data mining, pattern recognition, machine learning and artificial intelligence over the last decade. In clustering, the combination first produces several bases clustering, and then, for their aggregation, a function is used to create a final cluster that is as similar as possible to all the cluster bundles. The inp...

متن کامل

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

متن کامل

The ensemble clustering with maximize diversity using evolutionary optimization algorithms

Data clustering is one of the main steps in data mining, which is responsible for exploring hidden patterns in non-tagged data. Due to the complexity of the problem and the weakness of the basic clustering methods, most studies today are guided by clustering ensemble methods. Diversity in primary results is one of the most important factors that can affect the quality of the final results. Also...

متن کامل

Weighted Ensemble Clustering for Increasing the Accuracy of the Final Clustering

Clustering algorithms are highly dependent on different factors such as the number of clusters, the specific clustering algorithm, and the used distance measure. Inspired from ensemble classification, one approach to reduce the effect of these factors on the final clustering is ensemble clustering. Since weighting the base classifiers has been a successful idea in ensemble classification, in th...

متن کامل

انتخاب خوشه‌های اولیه به کمک الگوریتم‌های هوشمند برای مشارکت در خوشه‌بندی ترکیبی

Most of the recent studies have tried to create diversity in primary results and then applied a consensus function over all the obtained results to combine the weak partitions. In this paper a clustering ensemble method is proposed which is based on a subset of primary clusters. The main idea behind this method is using more stable clusters in the ensemble. The stability is applied as a goodnes...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 24  شماره 

صفحات  -

تاریخ انتشار 2010